Online Learning
This page contains resources about Online Learning and Sequential Prediction.

Subfields and Concepts
* Recursive Least Squares
* Mini-Batch Learning
** Mini-Batch Gradient Descent Methods
* Decision Theory
* Information Theory
** Entropy
** Kullback-Leibler (KL) Divergence
* Game Theory
** Minimax Theorem
** Blackwell's Approachability
* Online Dictionary Learning
* Online Algorithms
** Wake-Sleep Algorithm
** Auto-Encoding Variational Bayes (AEVB) Algorithm
* Online Convex Optimization
** Regret Bound
** Bregman Divergence
** No-regret Learning
** Online Gradient Descent
** Online Subgradient Descent
** Mirror Descent
** Stochastic Gradient Descent (SGD)
** Mini-batch Gradient Descent Methods
** Follow The Regularized Leader (FTRL)
** Multi-Armed Bandit (MAB)
** Regularization
*** L2-regularization / Tikhonov regularization / Ridge regression
*** L1-regularization / Least absolute shrinkage and selection operator (LASSO)
*** Matrix Regularization

Online Courses

Video Lectures
* Online Learning with a Memory Harness by Shai Shalev-Shwartz - VideoLectures.NET
* Trading Regret Rate for Computational Efficiency in Online Learning with Limited Feedback by Shai Shalev-Shwartz - VideoLectures.NET

Lecture Notes
* Statistical Learning Theory and Sequential Prediction by Alexander Rakhlin and Karthik Sridharan
* Machine Learning Theory by Karthik Sridharan
* Prediction and Learning: It's Only a Game by Jacob Abernethy
* Learning Theory by Sham Kakade and Ambuj Tewari
* Statistical Learning Theory by Prof. Dmitry Panchenko
* Introduction to Machine Learning by Shai Shalev-Shwartz
* Statistical Learning Theory by Maxim Raginsky
* Introduction to Online Optimization by Sebastien Bubeck

Books and Book Chapters
* Hazan, E. (2015). Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4), 157-325.
* Theodoridis, S. (2015). "Chapter 8: Parameter Learning: A Convex Analytic Path". Machine Learning: A Bayesian and Optimization Perspective. Academic Press.
* Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
* Sra, S., Nowozin, S., & Wright, S. J. (2012). Optimization for machine learning. MIT Press.
* Hazan, E. (2011). "Chapter 10: The Convex Optimization Approach to Regret Minimization". Optimization for machine learning. MIT Press.
* Shalev-Shwartz, S. (2011). Online Learning and Online Convex Optimization. Foundations and Trends® in Machine Learning, 4(2), 107-194.

Scholarly Articles
* Villa, S., Rosasco, L., & Poggio, T. (2013). On Learning, Complexity and Stability. arXiv preprint arXiv:1303.5976.
* Arora, S., Hazan, E., & Kale, S. (2012). The Multiplicative Weights Update Method: A Meta-Algorithm and Applications. Theory of Computing, 8(1), 121-164.
* Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2), 107-194.
* Abernethy, J., Bartlett, P. L., & Hazan, E. (2011). Blackwell Approachability and No-Regret Learning are Equivalent. In COLT (pp. 27-46).
* Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th annual international conference on machine learning (pp. 689-696). ACM.
* Ying, Y., & Pontil, M. (2008). Online gradient descent learning algorithms. Foundations of Computational Mathematics, 8(5), 561-596.
* Shalev-Shwartz, S. (2007). Online Learning: Theory, Algorithms, and Applications. PhD Dissertation, Hebrew University.
* Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
* Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (pp. 928–936).
* Dietterich, T. G. (2002). Machine learning for sequential data: A review. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 15-30). Springer Berlin Heidelberg.

See also
* Statistical Learning Theory
* Probability Theory

Other resources
* Wiki for research in Online Prediction
* How large should the batch size be for stochastic gradient descent? - Cross Validated Stackexchange
* Should training samples randomly drawn for mini-batch training neural nets be drawn without replacement? - Cross Validated Stackexchange

Category:Machine Learning